1.   Introduction

We have advocated elsewhere [7,8] a model--called the Syntactic Model--
for the analysis and description of pictures. This model was originally
developed to answer a very specific problem, namely, that of identifying
patterns that occur in bubble chamber negatives. Towards this end, a variety
of specific algorithms have been developed and a computer program is currently
being written to implement the model in this case. A novel parallel-processing
computer--called the Pattern Articulation Unit (PAU)--is also under
fabrication to carry out the specific computations involved in the algorithms
in an especially efficient manner.

The  Syntactic  Model  itself  is,  however,  much  more  general  in  scope. 
Briefly,  it  is  a  descriptive  scheme  based  on  assigning  a  hierarchic  system  of 
labels  to  the  points  which  make  up  the  picture.    The  labeling  algorithms 
make implicit use of the underlying syntax which characterizes the class of
pictures  being  described.   Central  to  the  descriptive  scheme,  then,  is  the 
notion  of  a  class  of  pictures  and  a  set  of  grammar  rules  which  characterize  the 
syntax  of  the  patterns  that  occur  in  the  pictures  of  that  class. 

For  example,  in  the  specific  context,  cited  earlier,  of  bubble  chamber 
negatives,  the  class  sought  to  be  described  is  the  class  of  pictures  composed 
of  line-like  elements  (or,  more  informally,  pictures  which  look  like  road  maps). 
The  particular  grammar  rules  are  to  a  large  extent  specified  by  the  underlying 
physical  process  which  gives  rise  to  these  pictures.  Another  useful  class  of 
pictures for study would be the class of hand-printed letters (or, more
generally, hand-printed alpha-numeric characters). Although these look like
road  maps  too,  clearly,  the  grammar  rules  for  their  efficient  description  need 
not  be  the  same  as  in  the  previous  instance.   However,  because  of  this  basic 
similarity  in  appearance,  a  large  number  of  the  labeling  algorithms  devised  for 
the earlier class can be equally well applied to the class of hand-printed letters.


The  PAU  and  the  associated  computer  are  being  designed  and  built 
in  this  Laboratory  by  a  group  led  by  Dr.  B.  H.  McCormick. 

See below in Section 2 for a more detailed description of this model.

A third familiar class of pictures can be defined as follows: each
picture  is  composed  of  a  configuration  of  two-dimensional  fields,  the  boundaries 
of  each  field  being  some  well-defined  geometric  figure,  say,  a  circle,  square, 
triangle,  other  types  of  polygons,  etc.  A  generalization  of  this  class  would 
permit  the  "interiors"  of  the  individual  fields  to  be  either  all  black  or  all 
white.  A  straightforward  extension  of  the  labeling  algorithms  set  up  for  the 
road-map-like  pictures  can  be  made  to  apply  to  this  wider  class  also. 

Our principal concern here, however, is not with the construction
of  algorithms  for  application  to  specific  classes  of  pictures.  Rather,  our 
interest  is  in  a  meta-theoretic  study  of  the  potentialities  of  descriptive 
schemata  such  as  the  one  referred  to  earlier,  in  modeling  the  process  of  visual 
perception.  As  a  starting  point  towards  such  a  study,  we  shall  consider  in 
the  following  pages  the  ability  of  the  Syntactic  Model  to  cope  with  some  of 
the phenomenological features which have been emphasized by the Gestalt
school of psychologists as pre-eminently characterizing the visual perception of
data. In particular, we shall examine two such features--both extensively
studied and discussed in the literature: (1) the "spontaneous" organization of
entities in the visual field into "wholes and subwholes" according to qualitatively
discernible principles; (2) the occurrence of ambiguous (sometimes also called
reversible) figures and the visual phenomena associated with them.

It  must  be  emphasized  here  that  our  concern  at  this  stage  is  a  well 
circumscribed  one:   attempting  to  characterize  in  a  purely  functional  manner  the 
above  Gestalt  features  within  the  framework  of  a  coherent  descriptive  scheme 
for  processing  visual  data.   In  the  context  of  the  Syntactic  Model  (for 
descriptions  of  classes  of  pictures)  mentioned  earlier  in  this  section,  we  shall 
formulate  this  problem  as  follows:   Is  it  possible  to  extend  the  labeling  schemata 
in  a  natural  way  so  as  to  incorporate  in  the  resulting  descriptions  of  pictures 
the  same  kinds  of  organization  of  data  that  are  characteristic  of  the  visual 
process?  Moreover,  can  the  visual  phenomena  associated  with  ambiguous  figures 
be  characterized  within  the  framework  of  this  extended  Syntactic  Model  in  a  simple, 
intuitively meaningful way?

An affirmative answer to both these questions would, in effect, imply
that  a  computer  program  processing  visual  data  according  to  this  model  would, 
anthropomorphically  speaking,  "see"  the  data  organized  (and/or  ambiguous) 
exactly  as  required  by  the  Gestalt  principles.  We  shall  outline  in  the  sequel 
a specific extension to our basic Syntactic Model and exhibit a set of
computer-processed outputs based on this extension which bear out the above assertion.

Before  doing  this,  however,  it  is  first  necessary  to  describe  in 
greater  detail  the  basic  Syntactic  Model  as  well  as  the  Gestalt  features  one  is 
trying  to  characterize.  Sections  2  and  3  are  concerned  with  these  matters. 
In Section 4 we describe our extended model. A few computer-processed output
pictures exhibiting the above Gestalt features are given in Appendix 1.
Section 5 is concerned with a discussion of the ambiguous figures and of how
the  visual  phenomena  associated  with  them  follow  as  a  natural  corollary  to  the 
processing  details  implicit  in  the  Syntactic  Model.   In  the  last  section  we 
shall  consider  some  of  the  implications  of  the  Syntactic  Model  and  their 
relevance  to  known  empirical  results  connected  with  visual  perception.  We  shall 
also  draw  attention  to  a  few  problems  explicitly  suggested  by  the  model  which 
seem  worth  experimental  investigation. 


2.   The  Basic  Syntactic  Model 

For  the  sake  of  definiteness,  let  us  restrict  our  consideration  to 
pictures  consisting  of  black  and  white  points.  Figure  1  illustrates  a  picture 
of  this  class.   The  Syntactic  Model  would  now  seek  to  describe  this  picture  by 
assigning  a  hierarchic  system  of  labels  to  every  point  in  it.  The  labels 
assigned  form  a  hierarchy  in  the  following  sense:   the  procedure  for  labeling 
divides  into  a  series  of  well-defined  levels.  At  each  level,  the  labeled  outputs 
from  the  lower  levels  serve  as  the  input  to  the  current  level  of  labeling.  What 
particular  labels  are  assigned  at  each  level  would,  of  course,  depend  on  the 
particular class of pictures being (or sought to be) described.

As a specific example, consider the picture illustrated in Fig. 1
again  and  let  the  syntax  used  be  that  pertaining  to  the  description  of  the  class 
of  hand-printed  letters.  At  the  lowest  level,  each  point  is  assigned  one  of 
the  labels,  black  or  white.   In  the  next  level  all  black  points  connected 
together  are  assigned  a  distinct  connectivity  label.   At  the  third  level, 
labels are assigned to each point to signify what type of primary road it forms
a part of. (The four primary roads are North-South, East-West, Right Diagonal,
and Left Diagonal.) Using the road segments so formed and the grammar rules defining
the syntax, higher-order phrases are formed at the next level. The names
of  these  phrases  are  now  assigned  to  the  corresponding  points  as  labels.  This 
process  is  continued  till,  at  the  highest  level,  the  names  of  the  phrases  are, 
respectively,  the  names  of  the  letters.   These  are  thus  assigned  as  the  highest 
level  labels  to  the  points  constituting  the  respective  letters. 
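
The two lowest levels of this hierarchy can be sketched in a few lines. The following is a minimal illustration only, not the program referred to in the text: it assumes a picture given as rows of characters in which "#" marks a black point, and it assumes that black points touching horizontally, vertically or diagonally count as connected. The function name `label_levels` and the label spelling `I_1, I_2, ...` are illustrative choices.

```python
from collections import deque

def label_levels(picture):
    """Return {point: [level-1 label, level-2 label]} for a list of strings.

    Level 1 is 1 (black) or 0 (white). Level 2 is a connectivity label
    shared by all black points connected together (8-neighbour rule,
    which is an assumption of this sketch).
    """
    rows, cols = len(picture), len(picture[0])
    # Level 1: every point is labeled black (1) or white (0).
    labels = {(r, c): [1 if picture[r][c] == "#" else 0]
              for r in range(rows) for c in range(cols)}
    # Level 2: flood-fill each still-unlabeled black point.
    next_j = 1
    for start in sorted(labels):
        if labels[start][0] != 1 or len(labels[start]) > 1:
            continue
        name = f"I_{next_j}"
        next_j += 1
        labels[start].append(name)
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    q = (r + dr, c + dc)
                    if q in labels and labels[q][0] == 1 and len(labels[q]) == 1:
                        labels[q].append(name)
                        queue.append(q)
    return labels

PICTURE = [
    "#..#.",
    "#..#.",
    ".....",
    "..##.",
]
LABELS = label_levels(PICTURE)
```

Each point ends up with a list of labels ordered by level, so higher levels (primary roads, phrases, letter names) would simply append further entries to the same lists.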

To illustrate how this labeling scheme would work, consider the three
points labeled P, Q, R in Fig. 1. In the table below are listed the various
labels assigned to the three points at the different levels. In writing down
the table entries, the following notation has been used: I_j (j = 1, 2, ...) denotes
the connectivity labels, all points connected together being assigned the same
label I_j. The road labels used are: N (North-South); E (East-West);
A (Right Diagonal); B (Left Diagonal).

We have tacitly assumed here that the "figure" in the field
is  defined  by  the  black  points.  There  is,  of  course, 
no  justification  for  this  assumption.  The  figure  could 
as  well  be  defined  by  the  white  points.   In  certain 
circumstances,  both  sets  of  points  could  simultaneously 
define  figures  independently,  thus  giving  rise  to 
ambiguities  in  further  processing.   This  is  a  crucial 
point  and  we  shall  return  to  it  again  in  Section  5  below 
where  we  discuss  ambiguous  pictures. 

The  name  'phrase'  is  used  merely  as  a  suggestive  terminology. 

Phrases  at  all  levels  need  not  be  linear  strings;  in  fact, 
they  will  not  be.   Distinct  phrases  should  be  thought 
of  as  representing  certain  well-defined  'primitive' 
graphs. (See next footnote.)

                P    Q    R    Remarks

Level 1:        1    1    1    Black = 1, White = 0
Level 2:        I    I    I    Connectivity
Level 3:        A    N    A    Names of primary roads
                               Phrase names
Last Level:     A    B    A    Names of letters

In the case of hand-printed English alphabets, these intermediate phrase
levels may, of course, be empty, or they may exist for some letters and
not for others. These details, which are for the most part optional,
do not play anything more than an incidental role in processing particular
pictures.

The  notion  of  phrases,  which  represent  intermediate  level  groupings,  is, 
however,  of  some  importance.   Its  significance  and  plausibility  can 
perhaps  be  made  clear  by  means  of  the  following  example:   Consider  an 
alphabet  consisting  of  the  five  characters  shown  below: 


[Figure: the five characters, namely a square; a square with a dot; a square
with a right slash; a square with a left slash; a square with a cross]

Let their names be S1, S2, S3, S4 and S5. Let the four component shapes
(□, /, \, X) be referred to as a square, a right slash, a left slash and a cross.
Then, clearly, an efficient description scheme for the five characters is
to say that S1 consists of a square, S2 of a square and a dot, S3 of a
square and a right slash, S4 of a square and a left slash, S5 of a
square and a cross. The square, the right and left slashes and the cross,
which are groupings of the primary roads N, E, A, B, are examples of
phrases. It is evident that their introduction simplifies the description
of this particular class of pictures. It is our view that in the
context of visual processing of data, although in many cases these
phrase structures may be arbitrary (i.e., optional), in some cases they
may be obligatory. In the latter event, the subgroupings should be
actually perceived as such.
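
The economy gained by introducing phrases can be made concrete with a small sketch. The character descriptions below follow the text; which primary roads make up each phrase (for example, a square as two N and two E roads, a cross as one A and one B road) and the treatment of the dot as a road-free primitive are assumptions made purely for illustration, as are the names `PHRASES`, `CHARACTERS`, `roads` and `name_of`.

```python
# Phrase -> primary roads. The road composition of each phrase is assumed.
PHRASES = {
    "square": ("N", "E", "N", "E"),
    "right slash": ("A",),
    "left slash": ("B",),
    "cross": ("A", "B"),
    "dot": (),  # assumed primitive: not built from primary roads
}

# Character -> phrases, as described in the text.
CHARACTERS = {
    "S1": ("square",),
    "S2": ("square", "dot"),
    "S3": ("square", "right slash"),
    "S4": ("square", "left slash"),
    "S5": ("square", "cross"),
}

def roads(character):
    """Expand a character name into its flat sequence of primary roads."""
    return tuple(r for phrase in CHARACTERS[character] for r in PHRASES[phrase])

def name_of(phrases):
    """Return the character described by an unordered collection of phrases."""
    wanted = frozenset(phrases)
    for name, parts in CHARACTERS.items():
        if frozenset(parts) == wanted:
            return name
    return None
```

At the phrase level every character is described by at most two names, whereas the flat road description of S5 already needs six entries.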

Labeling  of  the  type  indicated  in  the  table  is  most  readily  realized  with 
the  help  of  the  Pattern  Articulation  Unit  referred  to  earlier.   Extensive 
work  using  hierarchic  labeling  algorithms  has  been  done  with  a  parallel 
processing  simulator  that  has  been  written  for  the  IBM-7090  computer 
[9]. Some processed outputs using this simulator are given below in
Appendix 1. For a more detailed description of labeling schemata of the
type exemplified in the table, see [8].

Such a scheme of labels assigned to each point in the picture
constitutes a complete description of the picture--for example, that it
consists of three distinct (i.e., disjoint) hand-printed letters, namely
two A's and a B. Clearly, even at a more local level, the labeling scheme
provides a complete set of intuitively significant descriptions. Considering
the three points P, Q, R, although they are all disjoint, the labels assigned
show that P and R are similarly situated in a certain intuitively meaningful
sense and that both of them differ in this respect from Q. It is also worth
noting  that  the  processing  based  on  labeling  schemata,  as  indicated  above,  is 
intrinsically  independent  of  the  positioning  of  the  picture  in  the  'visual' 
field.   It  is  also  independent  of  the  size  of  the  total  picture  so  long  as 
this  size  is  not  too  small  for  the  resolution  of  the  underlying  mosaic  of 
points. By incorporating equivalence relationships in the grammar rules,
recognition of figures in the field can also be made, to a large extent,
independent of their orientation. Thus, "transposition," one of the
central problems in modeling visual perception, is taken care of in a basic
way by labeling schemata.


3.   The Gestalt-Qualität of Visual Perception

It  is  a  matter  of  common  experience  that  the  visual  field  is  not  a 
chaotic  patchwork  of  various  colors  and  brightness  but  consists  of  structured 
units,  certain  areas  belonging  together  and  forming  shaped  regions  distinguishably 
segregated  from  other  areas.   One  can  give  a  variety  of  examples  to  demonstrate 
this  tendency  for  spontaneous  organization  of  visual  data:   perhaps  the  best 
known  is  that  associated  with  a  large  number  of  dots,  lines,  squares,  etc.,  which, 
apparently  distributed  at  random  in  the  visual  field,  take  on  various  phenomenal 
groupings.   It  was  emphasized  originally  by  the  Gestalt  psychologists  that  this 
tendency  for  spontaneous  organization  is  an  intrinsic  characteristic  of  the  visual 
process  and  requires  investigation  and  study.  A  comprehensive  exposition  of 
this view may be found, for instance, in Kohler's book on Gestalt psychology [5].

Our immediate interest here is not in the Gestalt theory per se or in
its wider psychological implications but in the so-called "principles of
organization" first enunciated by Wertheimer. (A somewhat condensed translation
of his original paper may be found in [1]; see also Dember [4].) After drawing
attention  to  the  fact  that  a  collection  of  figures  in  the  visual  field  is, 
perceptually,  more  than  chaotic  and  unstructured,  he  pointed  out  that  the  way 
in  which  the  parts  are  seen,  in  which  groupings  occur  and  subwholes  emerge,  is 
not  completely  arbitrary  but  is  subject  to  well-discernible,  albeit  qualitative, 
organizing  principles.   He  characterized  several  properties  of  figures  in  terms 
of  which  the  resulting  organization  could  be  described.   Of  these,  for  our 
present  purpose  (since  we  are  not  considering  either  movement  or  depth  vision), 
the  properties  of  major  significance  are  proximity,  similarity,  good  continuation 
and  closure.  We  shall  describe  briefly  the  import  of  each  one  of  these  by  means 
of suitable examples. (Figures 2, 3, 4, 5 are taken from Dember [4].)

Proximity:   Other  factors  being  constant,  subgroups  tend  to  be  formed 
from  parts  which  are  spatially  close  together.   Such  immediate  organization 
based  on  the  proximity  of  elements,  moreover,  tends  to  be  highly  stable. 
This  is  illustrated  in  Figs.  2a  and  2b.   In  2a,  the  squares  are  perceived 
as  organized  into  horizontal  lines;  in  2b,  the  same  elements  are  seen 
organized  into  vertical  lines. 

Similarity:  Where  elements  are  dissimilar,  organization  will  be 
determined by similarity relationships among the elements. In Fig. 3,
for  instance,  although  the  squares  form  an  equally  spaced  grid,  the 
dark  squares  and  the  light  ones  are  seen  organized  into  vertical  lines. 
Under  certain  circumstances,  similarity  might  play  a  more  decisive  role 
in the resulting organization than proximity.

Good Continuation: This principle is concerned with the manner
in which an existing organization tends to affect its extension to
newly introduced elements in the field. Referring to Fig. 4, for
example, given the organization of the shaded circles, to extrapolate,
the principle of good continuation would demand choice of c rather
than of a or b, although all three are equally close to the last
shaded circle. In fact, even if c were missing, the choice would be
on d rather than a or b.

Closure: This organizing principle is illustrated in Fig. 5. The
dots  are  seen  as  forming  two  separate  enclosed  regions  rather  than  as  one 
smooth  figure-of-eight  curve. 

It  must  be  emphasized  that,  in  all  the  foregoing,  the  claim  is  not  that 
those  indicated  are  the  only  organizations  perceived.  The  point  made  is  that  these 
organizations  occur,  so  to  speak,  spontaneously  and  remain  stable;  to  "see"  any 
other  organization  requires  great  effort  and  despite  this  the  resulting  figure 
tends  to  be  unstable. 


4.   The Extended Syntactic Model

Before  explaining  the  extension,  we  have  to  introduce  an  operation  on 
pictures  which  we  shall  term  'smearing.'   By  smearing,  we  mean  a  local  spatial 
extension  of  the  set  of  points  which  constitute  a  picture.  This  operation  is 
most  simply  realized  as  follows:   replace  each  point  in  the  picture  by  a 
neighborhood  of  points,  say,  by  a  circular  area  with  the  original  point  as  the 
center.  The  resulting  picture  will  clearly  be  a  certain  "fuzzy"  enlargement  of 
the  original  input  picture.   Figure  6  illustrates  this  smearing  operation  carried 
out on some representative pictures.

FIGURE 6. EXAMPLES OF SMEARING OPERATION (panels 6.a and 6.b;
SMEARED AREA SHOWN LIGHTLY SHADED)

To  be  sure,  for  actual  implementation  in  a  processing  algorithm,  this 
definition  has  to  be  made  more  precise.   It  must  be  specified  in  quantitative 
detail  to  what  extent  (i.e.,  how  much  in  spatial  area)  each  point  is  to  be 
smeared.  However,  we  shall  not  attempt  any  such  quantitative  characterization 
at this stage for two good reasons: (1) we do not have any plausible criterion
at present on which to base such a quantitative measure; (2) for the purposes of
our  present  exposition,  all  that  we  need  are  the  qualitative  implications  of  such 
a  smearing  operation,  as  we  shall  see  presently. 
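
For readers who want something concrete, the smearing operation can be sketched as follows. This is only one possible reading: the picture is taken to be a set of (row, column) grid points, the neighborhood is a disc of grid points, and the `radius` parameter is exactly the quantitative detail the text leaves open, so its value here is an arbitrary assumption. The sketch also ignores the boundary of the visual field.

```python
def smear(points, radius):
    """Replace each point by the disc of grid points within `radius` of it.

    `points` is a set of (row, col) pairs. The result always contains the
    original points, since the offset (0, 0) lies inside every disc.
    """
    reach = int(radius)
    offsets = [(dr, dc)
               for dr in range(-reach, reach + 1)
               for dc in range(-reach, reach + 1)
               if dr * dr + dc * dc <= radius * radius]
    return {(r + dr, c + dc) for (r, c) in points for (dr, dc) in offsets}

SINGLE = smear({(0, 0)}, 1)
PAIR = smear({(0, 0), (0, 3)}, 1)
```

With radius 1 a single point grows into a plus-shaped patch of five points, and two points three columns apart grow into patches that touch, which is the qualitative effect the argument relies on.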

The  basic  Syntactic  Model,  it  will  be  recalled,  consisted  in  assigning 
a  hierarchic  system  of  labels  to  each  point  in  the  picture.  These  labels  were 
to  be  assigned  in  a  well-defined  order,  as  described  earlier  in  Section  2: 
first  the  labels  black,  white;  next  the  connectivity  labels;  then  in  increasing 
order  the  phrase  names  till  the  highest  level  (as  defined  by  the  grammar  for  the 
class  under  consideration)  is  reached.  At  this  stage,  the  original  picture  can 
be  decomposed  into  subpictures,  each  subpicture  consisting  of  all  those  points 
which have been assigned the same highest level label. Visualize these
subpictures as being present on separate replicas of the original field. The
extension to the model now consists in iterating this labeling scheme independently
on  the  subpictures,  after  smearing  them.  The  process  ends  when  on  two  successive 
cycles  the  same  highest  level  labels  are  assigned  to  the  points  of  the  original 
picture  (i.e.,  the  first  input  picture).   Schematically,  then,  the  extended  model 
can be represented as shown below:

In the very first cycle there is only one subpicture, namely, the input
picture. 
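
The cycle just described (label, decompose into subpictures, smear, relabel, stop when the labels no longer change) can be sketched as a short loop. The sketch makes strong simplifying assumptions: all input points are taken to carry one common highest-level label after the first labeling, so they form a single subpicture; the grammar-driven labeling is replaced by plain connectivity of the smeared subpicture; and the smearing radius is chosen arbitrarily. The names `smear`, `components` and `extended_labeling` are hypothetical.

```python
def smear(points, radius):
    """Replace each grid point by the disc of points within `radius` of it."""
    reach = int(radius)
    offsets = [(dr, dc)
               for dr in range(-reach, reach + 1)
               for dc in range(-reach, reach + 1)
               if dr * dr + dc * dc <= radius * radius]
    return {(r + dr, c + dc) for (r, c) in points for (dr, dc) in offsets}

def components(points):
    """Split a set of grid points into 8-connected components."""
    remaining, parts = set(points), []
    while remaining:
        stack = [remaining.pop()]
        part = set(stack)
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    q = (r + dr, c + dc)
                    if q in remaining:
                        remaining.remove(q)
                        part.add(q)
                        stack.append(q)
        parts.append(part)
    return parts

def extended_labeling(points, radius, max_cycles=10):
    """Iterate smear-and-relabel until the grouping of the ORIGINAL points
    is the same on two successive cycles. Returns (groups, cycles used)."""
    partition = {frozenset(points)}  # first cycle: one subpicture
    for cycle in range(1, max_cycles + 1):
        new_partition = set()
        for sub in partition:
            # Smear this subpicture alone, then group its original points
            # by the connected blob of the smeared picture they fall in.
            for blob in components(smear(sub, radius)):
                group = frozenset(p for p in sub if p in blob)
                if group:
                    new_partition.add(group)
        if new_partition == partition:
            return partition, cycle
        partition = new_partition
    return partition, max_cycles

# Dots spaced 2 apart horizontally and 4 apart vertically.
DOTS = {(r, c) for r in (0, 4, 8) for c in (0, 2, 4, 6)}
GROUPS, CYCLES = extended_labeling(DOTS, radius=1)
```

In this toy input the dots are closer together horizontally than vertically, so the smeared patches merge along rows only; the loop reports three horizontal groups and stops on the second cycle.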

Although  the  model  seems  complex  in  terms  of  the  description  given 
above,  conceptually  it  is  really  very  simple  and  straightforward,  given  the  basic 
hierarchical  labeling  scheme.  The  only  new  aspects  to  the  model  are  its  recursive 
characterization  and  the  operation  of  smearing.  The  crucial  point  to  bear  in 
mind  is  that  smearing  is  restricted  to  the  collection  of  points  with  the  same 
highest level label in each cycle. It might be objected that the processing
suggested by the model could become very complicated in cases where several
subpictures are generated in each cycle. For instance, if in each cycle, each
subpicture  decomposes  into  two  subpictures,  the  number  of  independent  subpictures 
to  be  labeled  would  go  up  exponentially.   This  is  true  in  principle.   In  fact, 
this  observation  suggests  some  new  problems  for  study  concerning  the  visual  process 
which  one  might  not  have  thought  of  a  priori.  We  shall  return  to  these  later 
in  this  paper.   For  the  moment,  however,  it  is  sufficient  to  note  that  in 
practice,  for  instance  in  all  the  examples  illustrated  in  the  previous  section, 
the  iteration  terminates  after  the  second  cycle  and  also  only  one  subpicture  is 
labeled  in  each  cycle  (i.e.,  only  one  subpicture  forms  the  figure;  the  other 
constitutes  the  'ground'  and  so  is  ignored). 

It  is  not  difficult  to  verify  that  an  iterative,  hierarchical  labeling 
carried  out  as  suggested  by  the  model  will,  in  effect,  result  in  organizations 
conforming  to  the  Gestalt  principles.   Smearing,  being  a  purely  local,  neighborhood 
operation,  will  clearly  be  influenced  by  proximity  considerations.   The 
separation  into  subpictures  conforms  to  the  similarity  principle.   In  fact,  it 
is  plausible  to  argue  that,  at  every  level,  an  operational  meaning  for  similarity 
is  the  assignment  of  common  labels.   If  we  recall  now  that  a  picture  is  a  set  of 
labeled  points,  it  is  readily  seen  that,  at  each  cycle,  the  processing  implied 
by  the  model  amounts  to  the  following:   Step  1.   decompose  the  picture  into 
similar subpictures; Step 2. form pictures according to proximity.

In the language of the Syntactic Model, the principle of good continuation
would be formulated in terms of phrase structures. Qualitatively one would say
that given a picture, new elements introduced into the field tend to be incorporated
into the picture so as to preserve the underlying phrase structure. This is
naturally bound to be a very context-dependent operation. It seems quite
plausible  that  at  least  a  good  part  of  the  closure  phenomenon  is  characterizable 
on  similar  considerations.  However,  other  psychological  factors  having  to  do 
with  the  stability  in  the  visual  field  of  simply  connected  domains  might  play  a 
determining  role.  We  do  not  wish  to  prejudge  the  issue  at  this  stage. 

To  demonstrate  the  algorithmic  feasibility  of  the  model,  we  give  in 
Appendix  1  some  actual  computer  outputs.   These  have  been  obtained  using  the 
parallel  processing  simulator  referred  to  earlier  and  exhibit  the  organization 
achieved  in  the  input  data  when  processed  according  to  the  recursive  syntactic 
schemata  outlined  in  this  section.   The  close  conformity  to  the  Gestalt  principles 
in  the  "perceived"  organizations  should  be  considered  as  a  convincing  demonstration 
of  the  intrinsic  potentialities  of  syntactic  labeling  schemes  to  serve  as 
explanatory  models  for  visual  perception. 


5.  Ambiguous  Figures 

So  far,  we  have  been  concerned  with  the  perceptual  organization  of  elements 
in  the  visual  field  into  structured  subpictures.  But  a  much  more  remarkable 
feature  of  the  visual  process  is  that  certain  stimulus  configurations  tend  to 
give rise to multiply-perceived organizations. These are the so-called ambiguous
figures,  sometimes  also  referred  to  as  reversible  figures  when  the  alternatives 
are  two  in  number.  A  variety  of  such  patterns  have  been  discussed  in  the 
literature.   In  this  section  we  shall  see  that  the  occurrence  of  ambiguous 
figures  is  most  readily  comprehended  in  terms  of  the  Syntactic  Model.  We  shall 
see  further  that  some  of  the  visual  phenomena  associated  with  these  figures  in  fact 
verify some of the results one would be led to infer from the model.

Let us recall, to begin with, that a Syntactic Model is a hierarchic
labeling  scheme  where,  at  each  level  of  the  hierarchy,  labels  are  assigned  to  the 
input  elements  of  that  level  by  structuring  them  according  to  given  composition 
rules  (i.e.,  the  grammar  rules  associated  with  that  level).  Unless  explicit 
constraints  are  built  into  the  model  to  avoid  all  ambiguities,  clearly,  situations 
can  arise  where  a  given  set  of  input  elements  admit  of  more  than  one  grammatically 
valid  structuring.  Moreover,  in  general,  this  can  arise  at  any  level  of  the 
hierarchy.   It  would  perhaps  be  instructive  to  consider  a  few  illustrative 
examples  from  a  Syntactic  Model  with  which  everyone  is  familiar,  namely,  that 
for  the  natural  languages.   Consider  discourse  in  English,  to  be  specific. 

The  standard  model  for  describing  (or  equivalently,  analyzing)  the 
flux  of  English  discourse  is  to  label  it  into  phoneme  sequences,  morpheme  sequences, 
phrases,  sentences  and  so  on,  hierarchically,  according  to  well-defined  syntactic 
rules.   It  is  a  common  experience  that  ambiguities  in  labeling  often  arise.  We 
give  some  examples  below. 

The following two valid morpheme sequences are easily verified as
arising from the same (except for the final -s) underlying phoneme sequence:

The  good  candy  came  any  way. 
The  good  can  decay  many  ways. 

Ambiguities  at  the  phrase  structure  level  are  much  more  readily  come  by.   The 
following,  due  to  Chomsky,  is  probably  as  famous  as  any: 

They are flying planes.

It  is  important  to  note  two  points  here:   (i)  that  ambiguous  constructions  can 
occur;  (ii)  that  often  the  ambiguity  can  be  resolved  by  explicitly  exhibiting 
the  preferred  sense  by  bracketing.  For  instance,  the  following  bracketing,  in  the 
example  above,  makes  the  intended  sense  unambiguous: 

(They) (are (flying planes)).
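
That a single word sequence can admit more than one grammatically valid structuring is easy to check mechanically. The sketch below enumerates every bracketing of the sentence under a toy grammar; the grammar, the category names (`Aux`, `Ving`, `Adj` and so on) and the lexicon are assumptions invented for this illustration and are not taken from Chomsky's analysis.

```python
from functools import lru_cache

# Toy binary grammar (an assumption): parent -> list of (left, right).
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP")],
    "V": [("Aux", "Ving")],   # verb group, as in "are flying"
    "NP": [("Adj", "N")],     # modified noun, as in "flying planes"
}
# Toy lexicon (an assumption): word -> categories it may carry.
LEXICON = {
    "they": {"NP"},
    "are": {"V", "Aux"},
    "flying": {"Adj", "Ving"},
    "planes": {"N", "NP"},
}

def bracketings(sentence, symbol="S"):
    """Return all bracketed structurings of `sentence` as `symbol`."""
    words = tuple(w.lower() for w in sentence.split())

    @lru_cache(maxsize=None)
    def parse(sym, i, j):
        found = []
        # A single word may carry the category directly.
        if j - i == 1 and sym in LEXICON.get(words[i], ()):
            found.append(words[i])
        # Otherwise try every rule and every split point.
        for left, right in RULES.get(sym, ()):
            for k in range(i + 1, j):
                for a in parse(left, i, k):
                    for b in parse(right, k, j):
                        found.append(f"({a} {b})")
        return tuple(found)

    return sorted(parse(symbol, 0, len(words)))

READINGS = bracketings("They are flying planes")
```

Under these assumed rules the sentence has exactly two structurings, one grouping "flying planes" as a unit and one grouping "are flying" as a unit; choosing a bracketing selects one of them.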


These  examples  are  quoted  by  J.  S.  Bruner  in  [3]  where  he  attributes 
them to G. A. Miller.

Chomsky has, however, shown that for the description of English sentences
this type of immediate constituent analysis is not wholly adequate and has
extended the Syntactic Model by an additional level which he has called the
transformational level. Certain sentence constructions are now to be described
by giving their transformational "history" from specific underlying kernel
sentences. Some ambiguous constructions are most readily resolved by making use
of  this  device.   Consider,  for  example,  the  following  sentence: 

Visiting relatives can be a nuisance.

One can quite simply indicate the preferred connotation by saying that the above
sentence is to be viewed as a transform of the sentence

Visiting relatives are a nuisance.

The other sense can be derived from the underlying construction

Visiting relatives is a nuisance.

To go back to ambiguous pictures, it is our contention that within the
framework of the Syntactic Model for describing pictures that we have been
outlining, the occurrence of ambiguous pictures and their resolution should be
construed in a manner quite analogous to the linguistic examples considered above.
Let  us  try  to  substantiate  this  claim  by  considering  three  specific  examples. 

It  will  be  recalled  that  in  the  labeling  scheme,  at  the  very  lowest 
level, points are assigned the labels black and white. In all our discussions so
far  we  have  tacitly  assumed  that  the  figure  is  composed  of  the  black  points  and 
have  been  concerned  with  their  subsequent  labeling  exclusively.   It  is  clear 
that  a  figure  could  equally  well  be  defined  by  the  white  points  and,  in  particular 
instances,  both  sets  of  points  could  define  figures  and  hence  be  eligible  for 
labeling  at  all  higher  levels.   In  such  situations  ambiguities  are  likely  to  arise 
and  we  suggest  that  ambiguous  pictures  based  on  the  figure-ground  relationship  are 
of  this  type.  A  well-known  example  is  the  picture  consisting  of  a  vase  and  two 
profiles illustrated in Fig. 7.

FIGURE 7. VASE & 2 PROFILES

FIGURE 8. A TRIPLY AMBIGUOUS FIGURE

Another type of ambiguity is illustrated in Fig. 8. This picture,
which  we  have  constructed,  is  triply  ambiguous.   It  can  be  viewed  as  two 
equilateral triangles, or as two parallelograms, or as two hour-glass-like
figures. We shall consider this picture in greater detail presently, but, for the
moment, suggest that the type of ambiguity illustrated here is somewhat analogous
to the phrase-structure-level ambiguity in the language model.

Finally, we have constructed in Fig. 9.a an ambiguous picture which
is a variant of the original "My wife - My mother-in-law" picture introduced by
Boring [2]. This variant, except for its 'streamlined' aspects, has the same
structural ambiguity as Boring's original picture. We suggest that ambiguities of
this type are most readily comprehended (and hence resolved) by considering the
picture as transforms of certain underlying unambiguous pictures. Figures 9.b and
9.c show the underlying pictures in terms of which the ambiguity implicit in Fig. 9.a
can be resolved. It is our view that the ambiguity associated with the well-known
reversible cube is also best understood along these lines.

We have seen earlier that in certain cases--especially at the phrase
structure level--the preferred construction can be explicitly exhibited by
auxiliary brackets. By analogy it would seem that the same kind of resolution
should be possible with pictures. Recalling that according to Gestalt principles
similar figures tend to be organized together (see Section 3), it is easy to check
this conjecture. In Fig. 10 we have redrawn the ambiguous picture of Fig. 8,
but explicitly imposing a preferred phrase structure for two of the lines. It
is evident that this imposed organization is consistent with only one of the three
possible "readings," namely, that into two parallelograms. Comparing Figs. 8 and
10, it does seem that in the latter the two-parallelograms aspect is much more
stable. That the same arguments can be extended even when the ambiguous picture
consists of dots instead of lines is verified by comparing Figs. 11.a and 11.b.

FIGURE 9.a


FIGURE    9.b 


FIGURE   9.c 


FIGURE 9. PORTRAIT OF TWO WOMEN

6.   Some Further Implications of the Syntactic Model

As was mentioned at the start of this paper, our ultimate goal is the
construction of a phenomenological model in terms of which known psychophysical
phenomena associated with the visual process could be adequately described and
studied. As a first step towards this end we have been concerned here with two
aspects of the visual process which have received great attention in the psychological
literature: the Gestalt qualität of the data in the visual field and the occurrence
of ambiguous figures. Our point of departure was a certain Syntactic Model for
picture processing which was originally developed to answer a very specific problem
in pattern analysis and description. In the foregoing pages, we have shown that a
very simple extension of this model does provide an adequate and intuitively
plausible basis for an understanding of these two aspects of the visual process.
In this last section, we should like to consider some of the implications and
secondary features of a Syntactic Model such as the one we have proposed and
see to what extent they conform to results obtained in studies on visual perception.

1.   Any  syntax  analysis  is  necessarily  a  sequential  processing  scheme. 

In  the  hierarchic  labeling  model,  for  instance,  the  labeling  process  at 
any  level,  in  general,  has  to  wait  for  the  completion  of  the  labeling 
process  in  the  preceding  levels.  This  would  imply  that  if  the  analysis 
is  time-limited,  only  the  initial  stages  of  labeling  could  be  realized. 
The  known  results  from  tachistoscopic  studies  on  organization  of  visual 
data  would  seem  to  be  entirely  in  conformity  with  this  expectation. 
They seem to add up to the following observations: perceptual
organization takes time; it is a temporal process. There is a
primary level of perception at which no grouping occurs. Furthermore,
there is a direction to this developmental process. Organization
proceeds from the simple to the complex. (See, for example, the
experiments of Krech and Calvin quoted by Dember [4] and those of
Oberly, Bobbit, etc., quoted by Vernon [10].) It should be of
considerable interest to make tachistoscopic studies using pictures of the
type illustrated in Fig. 9 (see below under (3) for further related
discussion).

2.   A second factor that plays an essential role in the labeling scheme is
the grammar. We have defined a grammar as being associated with
a class of pictures and as something given prior to the labeling.
This would seem to imply that before a picture can be labeled it
must be assigned to some specific class of pictures. We suggest
that this is entirely in keeping with the notion of 'set' as used
in the psychology of perception.

Actually,  going  a  step  further,  we  should  like  to  point  out  that  a 
Syntactic  Model  for  visual  perception  provides  a  very  fruitful  basis 
to  resolve  the  conflict  between  empiristic  and  organizational 
theories  in  regard  to  form  perception  (see,  e.g.,  the  comprehensive 
review by Zuckerman and Rock [11]). Structural linguists distinguish
between  'obligatory'  and  'optional'  rules  in  syntax  analysis.   It 
would  seem  entirely  appropriate  to  introduce  (or,  equivalently, 
look  for)  the  same  types  of  distinctions  in  the  labeling  schemata. 
Labeling  rules  which  apply  quite  independently  of  the  input  picture 
classification  are  precisely  those  which  one  would  characterize  as 
being intrinsic to the process itself (i.e., as the innate
organizational features); on the other hand, rules based on grammar
associated  with  particular  classes  would  be  the  optional  ones  and  would 
presumably  depend  on  past  experience,  in  so  far  as  grammar  is  a  measure 
of  achieved  learning  based  on  prior  exposure  to  the  particular  class  of 
pictures  or  on  knowledge  of  the  process  which  generates  the  pictures. 


It  is  not  necessary  to  suppose  that  this  assignment  to  a  class  of  pictures 
is  a  rigid  procedure  which  determines  subsequent  labeling  once  and  for  all. 
It  is  much  more  reasonable  to  assume  that  classification  is  also  a  dynamic 
procedure  and  that,  in  fact,  there  is  bound  to  be  a  considerable  amount  of 
feedback  between  labeling  and  classification.  For  our  immediate  purpose 
all that we argue is that, at any given level, grammar-dependent labeling
necessarily  presupposes  some  classification  of  the  input  picture  at  that 
level. 

Parenthetically, it may be remarked here that we do not believe that, in actual
perception, labeling is always carried through to its completion over all the
hierarchical levels. It is extremely likely that labeling is terminated as
soon as a description sufficient for the purposes on hand has been
achieved. A good part of the occasional "misreading" of a picture is
understandable along these lines.

3.   The remarks on decomposition into subpictures in Section 4 show
that processing according to the syntax model could become fairly
complex if the original configuration in the visual field cannot
be described in terms of a single figure and ground. It should be
highly instructive to study to what extent this is borne out in
actual perception. Figure 12, although quite straightforward in its
organization, indicates that its visual comprehension does require
great effort. Tachistoscopic studies on similar pictures should
throw considerable light on the sequential nature of the organization
process.
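
The remarks above on sequential processing and on the role of grammar can be
made concrete with a toy sketch. It is only an illustration under stated
assumptions, not the scheme used in our simulation: the names Rule,
label_picture, LEVELS and PICTURE, the three toy labeling rules, and the use of
a cap on the number of levels as a stand-in for a time limit are all
hypothetical. Rules with no class restriction play the part of the
'obligatory' rules; rules tied to a picture class play the part of the
'optional', grammar-dependent ones.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Set, Tuple

Labels = Dict[str, object]  # hypothetical store of labels computed so far


class Rule(NamedTuple):
    name: str
    apply: Callable[[Labels], object]    # computes a label from earlier labels
    classes: Optional[Set[str]] = None   # None: obligatory; else class-dependent


def label_picture(picture: List[str], picture_class: str,
                  levels: List[List[Rule]],
                  max_levels: int) -> Tuple[Labels, int]:
    """Label level by level; each level may use only labels from earlier levels.

    max_levels is an assumed stand-in for a time limit: levels beyond it
    are simply never reached.
    """
    labels: Labels = {"points": picture}
    completed = 0
    for level in levels[:max_levels]:
        for rule in level:
            # Optional rules fire only for the picture classes they belong to.
            if rule.classes is None or picture_class in rule.classes:
                labels[rule.name] = rule.apply(labels)
        completed += 1
    return labels, completed


# Toy hierarchy: black points -> occupied rows -> number of occupied rows.
LEVELS = [
    [Rule("black", lambda L: {(r, c)
                              for r, row in enumerate(L["points"])
                              for c, ch in enumerate(row) if ch == "#"})],
    [Rule("rows", lambda L: sorted({r for r, _ in L["black"]})),
     Rule("road_like", lambda L: len(L["black"]) >= 2,
          classes={"bubble_chamber"})],
    [Rule("extent", lambda L: len(L["rows"]))],
]

PICTURE = ["#.#", "...", "#.#"]
```

With max_levels set to 2 the third-level label is never produced, which is the
sense in which a time-limited analysis realizes only the initial stages of
labeling; and the class-dependent rule is skipped unless the picture has first
been assigned to the class it belongs to.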

If one accepts the plausibility of a syntactic scheme--in some such version
as we have indicated in this paper--as a metatheoretically appropriate framework
for  the  description  and  study  of  the  visual  process,  it  would  seem  that  for  the 
next  step  in  the  development  of  a  model  it  is  necessary  to  be  able  to  answer 
questions  of  the  following  kind:   How  much  of  the  labeling  is  done  in  parallel 
and  how  much  serially?  Does  scanning  the  visual  field  play  an  intrinsic  role 
in  labeling?   If  so,  at  what  level?  At  any  given  instant,  is  it  possible  for 
different  parts  in  the  visual  field  to  have  reached  different  levels  in  the 
labeling  hierarchy?   Is  it  possible  to  label  simultaneously  non-overlapping 
fields  using  different  grammars?  Can  this  be  done  even  if  the  fields  overlap? 


Such  visual  perceptual  situations  as  these  occur  rather  familiarly 
while  viewing  motion  pictures.   In  some  of  the  more  sophisticated 
productions,  quite  often  the  credit  titles  are  shown  superimposed 
on  the  initial  segment  of  the  main  picture  sequence. 

Appendix 1
(in  Collaboration  With  J.  P.  Fornango) 


We give here two examples of input pictures processed according to the
extended Syntactic Model (i.e., the recursive labeling scheme) described in
Section 4. The illustrations shown are actual outputs from an IBM-7090 computer.
The Syntactic Model was simulated using PAX, a general purpose parallel processing
simulator that has been written for this computer [9]. (For a detailed account of
the nature of computations involved in parallel processing and for the system
organization of a parallel processing computer, see [8].)

The design of the simulator limits the extent of the input 'visual'
field to a mosaic of 72 x 72 points. As will be seen from the two input pictures
shown (Figures 13.a and 13.c), this resolution is not fine enough for a proper
delineation of the squares and diamonds which make up the organizations in the
field. Hence, in the first cycle of recursion, we used a rather ad hoc labeling
scheme to separate out the subpictures (shown in Figs. 13.b and 13.d, respectively).
More  general  labeling  algorithms  can  be  constructed  to  identify  squares,  circles, 
triangles  and  other  familiar  geometrical  figures.  A  labeling  scheme  based  on  these 
could,  of  course,  be  used  if  the  resolution  of  the  input  field  were  sufficiently 
fine. 

In  smearing  the  subpictures,  the  quantitative  measure  used  was  to  replace 
each  black  point  by  a  set  of  nine  points,  viz.,  the  original  point  and  its  eight 
immediate  neighbors  in  the  mosaic. 
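
The smearing step just described amounts to what would now be called a
dilation with a 3 x 3 neighborhood. The following is a minimal sketch of it,
not the PAX code itself: the function name smear and the representation of the
mosaic as a list of rows of booleans (True for black) are assumptions made for
illustration, as is the treatment of the border, where neighbors falling
outside the mosaic are simply dropped.

```python
from typing import List

Mosaic = List[List[bool]]  # assumed representation: True marks a black point


def smear(picture: Mosaic) -> Mosaic:
    """Replace each black point by itself and its eight immediate neighbors."""
    rows, cols = len(picture), len(picture[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not picture[r][c]:
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Assumption: neighbors outside the mosaic are ignored.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = True
    return out
```

A single black point in the interior of the field thus becomes a 3 x 3 black
block, and neighboring blocks merge wherever the original points lie within
two mosaic steps of each other.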

For  the  second  cycle  in  the  recursion,  these  two  new  subpictures  should 
have  been  labeled  using  a  recognition  program  for  alphanumeric  characters.  We 
did  not  have  such  a  program  readily  available  to  us.   So  the  labeling  was  done 
using  a  scheme  that  has  been  devised  for  processing  bubble  chamber  pictures 
(for  details,  see  [6]).   This  labeling  algorithm  is  set  up  to  convert  the  input 
picture  into  a  labeled  graph.   It  accomplishes  this  by  first  thinning  the  roads  in 
the input picture suitably and then assigning directional labels to the road segments.
The directional labels assigned are N, E, A, B, as explained in the tabulation in
Section  2.  Junctions,  bends,  etc.,  where  two  or  more  road  segments  meet,  are 
identified  by  the  multiple  labels  assigned  to  them.   In  the  computer  outputs 
shown,  the  following  code  has  been  used: 


E,A:       3
A,B:       0
E,N,B:     4
E,N:       5
N,B:       2
A,N,B:     8
E,B:       9
E,A,N:     7
E,A,N,B:   U
A,N:       6
E,A,B:     1

Points  not  assigned  any  of  these  labels  are  referred  to  as  'nulls'  and  are  shown  in 
the  outputs  by  asterisks  (*).  A  region  in  the  field  consisting  entirely  of  nulls 
would  be  interpreted  as  an  undifferentiated  'black'  area. 
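
For readers who wish to decode the printed outputs, the code above can be
written as a small lookup table. This is a sketch only: the names CODES and
output_char are hypothetical, the entry for E,N,B is read as 4 (the one digit
not otherwise used in the tabulation), the symbol U for E,A,N,B is kept as
printed, and showing a point that carries a single label by that label's own
letter is an assumption, since the text specifies only the multiple-label
codes and the asterisk for nulls.

```python
from typing import Iterable

# Multiple-label codes as tabulated above; keys are unordered label sets.
CODES = {
    frozenset("EA"): "3",
    frozenset("AB"): "0",
    frozenset("ENB"): "4",   # assumed reading of the garbled entry
    frozenset("EN"): "5",
    frozenset("NB"): "2",
    frozenset("ANB"): "8",
    frozenset("EB"): "9",
    frozenset("EAN"): "7",
    frozenset("EANB"): "U",  # symbol kept as printed
    frozenset("AN"): "6",
    frozenset("EAB"): "1",
}


def output_char(labels: Iterable[str]) -> str:
    """Return the character printed for a point carrying the given labels."""
    key = frozenset(labels)
    if not key:
        return "*"              # a null point
    if len(key) == 1:
        return next(iter(key))  # assumption: a single label prints as itself
    return CODES[key]
```

The eleven entries exhaust the ways in which two or more of the four
directional labels can occur together at a point.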

The  separation  into  'Figure'  and  'Ground'  after  the  second  labeling  cycle 
is clearly evident in Figs. 14 and 15. Further smearing and labeling does not
generate  any  new  subpictures  and  so  the  processing  ends  here. 

FIG. 15
(input as in 13.c)